ABSTRACT

Because there are a number of tests for specification error arising from omitted variables or incorrect
functional form, one rarely knows which test is best to use. This paper compares the power of RESET (the regression
specification error test) with that of the Durbin-Watson test in detecting omitted variables or incorrect functional form in
a regression analysis, using the bootstrap method of simulation to see which test is better. The overall results show that
RESET is more powerful at all sample sizes considered in detecting a nonzero disturbance mean (i.e., a specification error)
resulting from incorrect functional form or omitted variables in a regression model.
INTRODUCTION

The  Encarta  English  Dictionary  (2007)  defined  specification  as  a detailed  description  of  something,  especially  one 
that  provides  information  needed  to  make,  build,  or  produce  something. 

In  regression  analysis  and  related  fields  such  as  econometrics,  specification  is  the  process  of  converting  a theory 
into  a regression  model.  This  process  consists  of  selecting  an  appropriate  functional  form  for  the  model  and  choosing 
which  variables  to  include.  Model  specification  is  one  of  the  first  steps  in  regression  analysis. 

One  of  the  assumptions  of  the  classical  linear  regression  model  is  that  the  regression  model  used  in  the  analysis  is 
correctly  specified.  If  this  fails  to  happen,  then  we  encounter  the  problem  of  specification  error. 

Specification errors, as the name implies, are errors associated with the specification of the model. These can take
many forms, such as omission of a relevant variable, inclusion of an unnecessary variable, adoption of the wrong
functional form, errors of measurement and incorrect specification of the stochastic error term.

When an irrelevant variable is included in the model, the OLS estimators remain unbiased and consistent; however,
the estimates will generally be inefficient. The included irrelevant variable may also be correlated with another
variable in the model, causing a fairly serious multicollinearity problem that unnecessarily inflates the standard
errors of the coefficients, so that the usual t-tests become unreliable.

Of unquestionable importance is the exclusion of a relevant variable. This specification error affects the
properties of the OLS estimators: in general they will be biased and inconsistent, that is, the bias will not go away
as the sample size increases.
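
A minimal worked case may help show why the bias persists. The two-regressor setup and the symbols $b_0$, $b_1$, $V_t$ and $\hat{\delta}$ below are illustrative assumptions, not notation taken from the paper. Suppose $X_2$ is relevant but omitted from the fitted regression:

```latex
\begin{align*}
Y_t &= \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + U_t
  && \text{(true model)} \\
Y_t &= b_0 + b_1 X_{1t} + V_t
  && \text{(fitted model, } X_2 \text{ omitted)} \\
E[\hat{b}_1 \mid X] &= \beta_1 + \beta_2 \hat{\delta},
  \qquad
  \hat{\delta} = \frac{\sum_t (X_{1t} - \bar{X}_1)(X_{2t} - \bar{X}_2)}
                      {\sum_t (X_{1t} - \bar{X}_1)^2} \\
\operatorname{plim} \hat{b}_1 &= \beta_1 + \beta_2 \,
  \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_1)}
\end{align*}
```

The extra term vanishes only if $\beta_2 = 0$ or if $X_1$ and $X_2$ are uncorrelated, so a larger sample does not remove it.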


Impact  Factor(JCC):  2.7341-  This  article  can  be  downloaded  from  www.impactjournaIs.us 




Choji  Niri  Martha  & Datong,  Godwin  Monday 


As mentioned earlier, specification errors can also be errors in the specification of the functional form that the
equation should take in describing the relationship between the variables in which we are interested. If we estimate a
nonlinear population relation with a linear regression on sample data, then we cannot expect the OLS estimators to be either
unbiased or consistent.

The practical question is not why specification errors are made, for they generally are, but how to detect them.
Because there are a number of tests for detecting omitted variables or incorrect
functional form, one rarely knows which test is best to use.

This paper compares the power of RESET (the regression specification error test) with that of the Durbin-Watson
test in detecting omitted variables or incorrect functional form, using the bootstrap method of simulation
to see which test is better.

LITERATURE  REVIEW 

An economic investigation begins with the specification of the econometric model underlying the phenomenon of
interest. Some important questions that arise in the specification of the model include the following:

• What  variable(s)  should  be  included  in  the  model? 

• What  is  the  functional  form  of  the  model?  Is  it  linear  in  the  parameters,  the  variables,  or  both? 

• What are the probabilistic assumptions made about the $Y_t$, the $X_t$, and the $U_t$ entering the model?

These  are  very  important  questions.  By  omitting  important  variables  of  the  model,  or  by  choosing  the  wrong 
functional  form,  or  by  making  wrong  stochastic  assumptions  about  the  variables  of  the  model,  the  validity  of  interpreting 
the  estimated  regression  will  be  highly  questionable. 

As stated earlier, we are comparing the power of RESET with that of the Durbin-Watson test in detecting
omitted variables or incorrect functional form in regression analysis.

Abrahamse and Koerts (1968) computed and compared the power of the Durbin-Watson test and the power of the BLUS
(best linear unbiased scalar) test. It appears that, for the cases
considered, the power of the Durbin-Watson test exceeds that of the BLUS test.

Thursby and Schmidt (1977), in an article titled “Some properties of tests for specification error in a linear
regression model”, considered the power of a number of variants of RESET, a test suggested by Ramsey (1969)
that is intended to detect a nonzero mean of the disturbance in a linear regression. They considered the
specification error test with various choices of test variables in addition to those originally suggested by Ramsey (1969).
Analysis of an approximation to the test statistic’s distribution and Monte Carlo experiments reveal that the power of
the test may decline as the size of the disturbance mean increases. However, the possibility is remote and declines with
increasing sample size. Alternative sets of test variables were considered, and their effects on the power of the test were
studied in the Monte Carlo experiments. The best set seems to be composed of powers of the explanatory variables.
Ramsey and Gilbert (1972), in a paper titled “A Monte Carlo study of some small sample properties of tests for
specification error”, further considered tests previously developed by other authors for the specification errors of
omitted variables, incorrect functional form, simultaneous equation problems and heteroskedasticity. The tests
considered were RESET, BAMSET (Bartlett’s M specification error test) and RASET (rank specification error test). In
comparing the relative sensitivities of the test statistics to various misspecifications, one concludes that RESET is the
most powerful of the three tests against the alternative $H_1$.

Several tests for specification error in a regression model have been proposed, and efforts have been made to
show the relationship between the tests. For example, Thursby (1985), in a paper titled “The Relationship among the
Specification Test of Hausman, Ramsey and Chow”, says that the three tests are related. The Monte Carlo study of Thursby
and Schmidt (1977) indicates that the power of RESET generally rises only slightly as the number of test variables increases
and, therefore, should be similar to that of Hausman’s test.

Furthermore, Olubusoye et al. (2004), in a paper titled “A Comparative Study of Some Specification Error
Tests”, compared the power of RESET, the White test and the Q-test in detecting specification errors arising from omitted
variables, functional misspecification and contemporaneous correlation of residuals. They concluded that RESET is the most
powerful test for detecting incorrect functional form and that the test is robust to autocorrelated and heteroscedastic
disturbances. The Q-test is the most powerful in detecting autocorrelation, while the White test is used for detecting
heteroscedasticity.

Finally, Sapra (2005), motivated by specification tests for functional form and omitted variables in the linear
regression model, developed two versions of the regression specification error test (RESET) for GLMs (generalized
linear models). The tests when applied

METHODOLOGY 

Consider the standard linear regression model

$Y = X\beta + U$ (3.1)

where

$Y$ is an $n \times 1$ vector of observations on the dependent variable,

$X$ is an $n \times k$ matrix of regressors,

$\beta$ is a $k \times 1$ vector of parameters, and

$U$ is an $n \times 1$ vector of disturbances.

The null hypothesis to be tested is that $E[U \mid X] = 0$,

where $U$ is normally distributed with covariance matrix proportional to the identity matrix. The alternative
hypothesis is that a specification error has occurred which results in $E[U \mid X] = \xi \neq 0$.

This paper used the bootstrap method of simulation to generate data for the comparison of RESET and the
Durbin-Watson test.
The table below lists the 3 models investigated in this paper. The models were selected from the models considered
by Thursby (1979).

Since we are looking at specification errors resulting from omitted variables or incorrect functional form, we
consider model 1, which is correctly specified; model 2, the case of incorrect functional form; and model 3, the case of an
omitted variable. Observations on the dependent variable are generated according to the specification labelled True.
The model labelled Null is then tested for specification error at the 5 percent level of significance.


Table 1: The Models Considered

| Model | True specification | Null specification | Problem |
|---|---|---|---|
| 1 | $Y_t = 10.0 + 5.0X_{1t} - 2.0X_{2t} + U_t$ | $Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + U_t$ | None (correct specification) |
| 2 | $Y_t = 1.0 + 2.0X_{1t} - 0.8X_{2t} + U_t$ | $Y_t = \beta_0 + \beta_1 e^{X_{1t}/5} + \beta_2 e^{X_{2t}/5} + U_t$ | Incorrect functional form (additive effects) |
| 3 | $Y_t = 0.8 - 0.6X_{6t} + X_{7t} + 1.5X_{8t} + U_t$ | $Y_t = \beta_0 + \beta_1 X_{6t} + \beta_2 X_{7t} + U_t$ | Omitted variable ($R^2 = 0.96$) |

The  Bootstrap  Experiment 

Using the typical bootstrap, consider the specification labelled True in model 1:

$Y_t = 10.0 + 5.0X_{1t} - 2.0X_{2t} + U_t$ (3.2)

where $U_t \sim N(0, \sigma^2)$ and also satisfies the other classical assumptions of least squares estimation. Numerical
values were assigned to all the parameters ($\beta_0 = 10.0$, $\beta_1 = 5.0$, $\beta_2 = -2.0$) in the model. The variance $\sigma^2$ was also assigned a
numerical value, and on the basis of the assumed $\sigma^2$ the disturbance term $U_t$ was generated. A random sample of size $n$ of each $X$ was
selected from a pool of uniformly distributed random numbers on the interval (0, 1), and the numerical values of
$10.0 + 5.0X_{1t} - 2.0X_{2t}$ were computed. The vector $Y$ was then obtained by computing $10.0 + 5.0X_{1t} - 2.0X_{2t} + U_t$. We set sample sizes
$n$ = 20, 30 and 50 for the purpose of the study. Microsoft Excel was used to generate the data.
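
The data-generating step described above can be sketched in Python. This is only an illustration under stated assumptions: the paper generated its data in Excel and does not report the value assigned to $\sigma^2$, so `sigma=1.0`, the seed, and the function name `simulate_model1` are hypothetical choices.

```python
import numpy as np

def simulate_model1(n, sigma=1.0, seed=0):
    """Draw one sample of size n from the True specification of model 1.

    sigma and seed are assumed values; the paper does not report them.
    """
    rng = np.random.default_rng(seed)
    # Regressors drawn from uniformly distributed random numbers on (0, 1).
    x1 = rng.uniform(0.0, 1.0, size=n)
    x2 = rng.uniform(0.0, 1.0, size=n)
    # Disturbance U_t ~ N(0, sigma^2).
    u = rng.normal(0.0, sigma, size=n)
    # Y_t = 10.0 + 5.0 X_1t - 2.0 X_2t + U_t  (equation 3.2)
    y = 10.0 + 5.0 * x1 - 2.0 * x2 + u
    return x1, x2, y

# One sample for each sample size used in the study.
samples = {n: simulate_model1(n, seed=n) for n in (20, 30, 50)}
for n, (x1, x2, y) in samples.items():
    print(n, x1.shape, x2.shape, y.shape)
```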

Using a bootstrap software package (in this work, STATA was used), the X’s and the Y’s generated were copied
from Excel into STATA, then bootstrapped and replicated 1000 times using a STATA command. Each replicate produces a
bootstrap sample which gives distinct values of $Y$. This leads to different estimates $\hat{\beta}$ of $\beta$ for each bootstrapped
sample, from the several regressions of $Y$ on the fixed X’s. The procedure described above is then repeated for the different sample
sizes $n$.
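
One way to read "regressions of $Y$ on fixed X’s" is a residual bootstrap, in which the regressors are held fixed and the OLS residuals are resampled with replacement. The sketch below follows that reading as an assumption; the paper does not state which STATA bootstrap command or resampling scheme was used, and the names `ols` and `residual_bootstrap`, the seed, and $\sigma = 1$ are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients, fitted values and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return beta, fitted, y - fitted

def residual_bootstrap(X, y, n_reps=1000, seed=0):
    """Residual bootstrap with X held fixed (an assumed scheme)."""
    rng = np.random.default_rng(seed)
    beta_hat, fitted, resid = ols(X, y)
    betas = np.empty((n_reps, X.shape[1]))
    for r in range(n_reps):
        # Resample residuals with replacement and rebuild Y on the fixed X.
        u_star = rng.choice(resid, size=len(y), replace=True)
        betas[r] = ols(X, fitted + u_star)[0]
    return beta_hat, betas

# Illustrative data from the True specification of model 1 (sigma = 1 assumed).
rng = np.random.default_rng(42)
n = 30
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 10.0 + 5.0 * x1 - 2.0 * x2 + rng.normal(size=n)

beta_hat, betas = residual_bootstrap(X, y)
print("OLS estimate:", beta_hat)
print("bootstrap standard errors:", betas.std(axis=0, ddof=1))
```

Each row of `betas` is the coefficient estimate from one bootstrap replicate, so the spread of the rows approximates the sampling variability of $\hat{\beta}$.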

The above procedure was performed on each of the three models in Table 1. The outcome of the
bootstrap experiment was then analysed to compare the power of RESET and Durbin-Watson.

RESET  (Regression  Specification  Error  Test) 

Ramsey (1969) has argued that various specification errors (omitted variables, incorrect functional form,
correlation between $X$ and $U$) give rise to a nonzero mean of the disturbance vector $U$. Thus, the null and alternative
hypotheses are restated as follows.

$H_0$: $U \sim N(0, \sigma^2 I)$

$H_1$: $U \sim N(\xi, \sigma^2 I)$, where $\xi \neq 0$

The test of $H_0$ is based on the augmented regression

$Y = X\beta + Z\alpha + U$

The RESET procedure amounts to using the standard F-test to test whether $\alpha = 0$. Ramsey’s suggestion is that $Z$
should contain powers of the predicted values of the dependent variable. Using the second, third and fourth powers gives

$Z = [\hat{Y}^{(2)} \; \hat{Y}^{(3)} \; \hat{Y}^{(4)}]$

where $\hat{Y} = X\hat{\beta}$ and $\hat{Y}^{(2)} = [\hat{Y}_1^2 \; \hat{Y}_2^2 \; \ldots \; \hat{Y}_n^2]$. In the experiments we used the square and cube powers of the
predicted variable, following Thursby (1989).

Using the STATA package, we subjected the results of the bootstrap to RESET using the
command ovtest, which computes the Ramsey RESET test using powers of the fitted values of $Y$. The idea behind
ovtest is that it creates new variables based on the predictors and refits the model using those new variables to see whether any of
them is significant.
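
The F-test behind RESET can be sketched as follows. This is a minimal illustration, not a reproduction of STATA's ovtest: the function name `reset_f_statistic` and the example data are hypothetical, and only the statistic and its degrees of freedom are returned.

```python
import numpy as np

def reset_f_statistic(X, y, powers=(2, 3)):
    """F statistic for alpha = 0 in Y = X beta + Z alpha + U.

    X must already contain a constant column. Z holds the given powers of
    the fitted values from the null regression of y on X.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    rss_restricted = np.sum((y - fitted) ** 2)

    # Test variables: square and cube of the fitted values by default.
    Z = np.column_stack([fitted ** p for p in powers])
    XZ = np.hstack([X, Z])
    gamma, *_ = np.linalg.lstsq(XZ, y, rcond=None)
    rss_unrestricted = np.sum((y - XZ @ gamma) ** 2)

    q = Z.shape[1]
    df_resid = n - k - q
    f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)
    return f_stat, q, df_resid

# Hypothetical example: a linear null fitted to linear and to exponential data.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, size=50)
X = np.column_stack([np.ones(50), x])
y_linear = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=50)
y_curved = np.exp(x) + rng.normal(0.0, 0.1, size=50)

f_linear, q, df_resid = reset_f_statistic(X, y_linear)
f_curved, _, _ = reset_f_statistic(X, y_curved)
print("F (correct form):", f_linear, "df:", (q, df_resid))
print("F (wrong form):", f_curved)
```

The null of no misspecification is rejected at the 5 percent level when the statistic exceeds the corresponding critical value of the F distribution with `q` and `df_resid` degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f_stat, q, df_resid)` gives the p-value.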

Durbin-Watson  Test 

To use the Durbin-Watson test for detecting model specification error(s), we proceed as follows:

• From the assumed model, obtain the OLS residuals.

• If it is believed that the assumed model is misspecified because it excludes a relevant explanatory variable, say $Z$,
order the residuals obtained in step 1 according to increasing values of $Z$.

Note: The $Z$ variable could be one of the $X$ variables included in the assumed model, or it could be some function
of that variable such as $X^2$ and $X^3$.

• Compute the $d$ statistic from the residuals thus ordered by the usual $d$ formula, namely

$d = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2}$

Note: The subscript $t$ is the index of observation here and does not necessarily mean that the data are time series.

From the Durbin-Watson tables, if the estimated $d$ value is significant, then one can accept the hypothesis of
model misspecification. If that turns out to be the case, the remedial measures will naturally suggest themselves.

Here, we also used STATA to order the residuals using the command tsset var1 and subsequently computed the
Durbin-Watson $d$ statistic using the command estat dwatson.

The Durbin-Watson statistic ranges in value from 0 to 4. A value near 2 indicates no autocorrelation, a value
towards 0 indicates positive autocorrelation and a value towards 4 indicates negative autocorrelation.
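
The ordering step and the $d$ formula can be sketched in a few lines. The function name `durbin_watson_ordered` and the example data are hypothetical; significance still has to be judged against the Durbin-Watson tables, which this sketch does not include.

```python
import numpy as np

def durbin_watson_ordered(resid, z):
    """Durbin-Watson d computed on residuals sorted by increasing z."""
    order = np.argsort(z, kind="stable")
    e = np.asarray(resid, dtype=float)[order]
    # Sum of squared successive differences over the sum of squared residuals.
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical example: a straight line fitted to data that are quadratic in x.
x = np.linspace(0.0, 1.0, 50)
y = x ** 2
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Ordering by x leaves a smooth run of residuals, which pulls d towards 0.
d_curved = durbin_watson_ordered(resid, x)
print("d for the misspecified fit:", d_curved)

# Residuals that alternate in sign push d towards 4.
d_alternating = durbin_watson_ordered([1.0, -1.0, 1.0, -1.0], [0, 1, 2, 3])
print("d for alternating residuals:", d_alternating)
```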
The Power of a Test

The power of a statistical test is the probability that it will correctly lead to the rejection of a false null hypothesis,
that is, the ability of the test to detect an effect if the effect actually exists. The power may also be defined
as $1 - \beta$, where $\beta$ here denotes the probability of failing to reject a false null hypothesis. Recall that failing to reject a false
null hypothesis is referred to as a type II error. High power is always a desirable characteristic of a test. In this work, power is
measured by the percentage of times we rejected the null hypothesis.
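
As a small illustration of this measure, the percentage of rejections over a set of replications can be computed as below. The function name `rejection_percentage` and the numbers in the example are hypothetical and are not results from the paper.

```python
def rejection_percentage(statistics, critical_value):
    """Percentage of replications whose test statistic exceeds the critical value."""
    rejections = sum(1 for s in statistics if s > critical_value)
    return 100.0 * rejections / len(statistics)

# Hypothetical test statistics from four replications and an assumed critical value.
example_stats = [0.5, 4.2, 3.9, 1.1]
example_rate = rejection_percentage(example_stats, critical_value=3.2)
print(example_rate)
```

When the null model is misspecified this percentage estimates the power of the test; when the null model is correct it estimates the size of the test instead.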

RESULTS 

The experimental results are given in the table below. The entries in the table are the percentage rejections of the
null hypothesis of no misspecification (i.e., the percentage power) for RESET and the Durbin-Watson test.


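Table 2: Percentage Rejections of the Tests

| n | Model 1 RESET | Model 1 DW | Model 2 RESET | Model 2 DW | Model 3 RESET | Model 3 DW |
|---|---|---|---|---|---|---|
| 20 | 4.76 | 0.0 | 61.90 | 0.0 | 66.67 | 4.76 |
| 30 | 9.52 | 0.0 | 85.71 | 0.0 | 71.43 | 0.0 |
| 50 | 4.79 | 0.0 | 80.95 | 0.0 | 42.86 | 14.28 |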

The table above gives the percentage rejections for each model under the two tests considered in the
analysis at the different sample sizes. Model 1 is the model with correct specification, Model 2 is
the model with incorrect functional form, and Model 3 is the model with an omitted
variable.


For model 1, the case of correct specification, any rejection is a false rejection, so the entries reflect the size of each
test rather than its power: RESET rejects in 4.76 to 9.52 percent of cases, close to the 5 percent level of significance, while the
Durbin-Watson test never rejects. Considering model 2, the case of incorrect functional form, we see that RESET exhibits
substantial power while the Durbin-Watson test shows no power at all. Finally, looking at model 3, the case of an omitted
variable, RESET once again performs better than the Durbin-Watson test, which shows little power.

CONCLUSIONS 

The overall results show that RESET is more powerful than the Durbin-Watson test at all sample sizes considered in
detecting a nonzero disturbance mean (i.e., a specification error) resulting from incorrect functional form or omitted
variables in a regression model.